72 research outputs found
Very fast optimal bandwidth selection for univariate kernel density estimation
Most automatic bandwidth selection procedures for kernel density estimates require estimation of quantities involving
the density derivatives. Estimation of modes and inflexion points of densities also requires derivative estimates. The
computational complexity of evaluating the density derivative at M evaluation points given N sample points from the density is O(MN). In this paper we propose a computationally efficient ε-exact approximation algorithm for univariate, Gaussian kernel based, density derivative estimation that reduces the computational complexity from O(MN) to linear order (O(N+M)). The constant depends on the desired arbitrary accuracy, ε. We apply the density derivative evaluation procedure to estimate the optimal bandwidth for kernel density estimation, a process that is often intractable for large data sets. For example, for N = M = 409,600 points, the direct evaluation of the density derivative takes around 12.76 hours, while the fast evaluation requires only 65 seconds with an error of around 10^{-12}. Algorithm details, error bounds, procedure to choose the parameters and numerical experiments are presented. We demonstrate the speedup achieved on the bandwidth selection using the "solve-the-equation plug-in method" [18]. We also demonstrate that the proposed procedure can be extremely useful for speeding up exploratory projection pursuit techniques.
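As a rough illustration of the O(MN) baseline described in this abstract (not the paper's fast algorithm), the r-th derivative of a univariate Gaussian kernel density estimate can be evaluated directly with a double loop. The sketch below assumes the standard expression in terms of probabilists' Hermite polynomials; the function names `hermite` and `kde_derivative_direct` are hypothetical and chosen only for this example.

```python
import math

def hermite(r, u):
    """Probabilists' Hermite polynomial H_r(u) via the three-term recurrence."""
    h_prev, h_curr = 1.0, u
    if r == 0:
        return h_prev
    for k in range(1, r):
        # H_{k+1}(u) = u * H_k(u) - k * H_{k-1}(u)
        h_prev, h_curr = h_curr, u * h_curr - k * h_prev
    return h_curr

def kde_derivative_direct(samples, targets, h, r=0):
    """Directly evaluate the r-th derivative of a Gaussian KDE.

    Cost is O(M * N) for M targets and N samples: every target
    visits every sample once.
    """
    n = len(samples)
    norm = (-1) ** r / (math.sqrt(2.0 * math.pi) * n * h ** (r + 1))
    out = []
    for y in targets:          # M evaluation points
        total = 0.0
        for x in samples:      # N sample points
            u = (y - x) / h
            total += hermite(r, u) * math.exp(-0.5 * u * u)
        out.append(norm * total)
    return out
```

With `r=0` this reduces to the ordinary Gaussian kernel density estimate. The nested loops are what make the direct approach impractical at the sample sizes quoted in the abstract.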
POSITION CALIBRATION OF ACOUSTIC SENSORS AND ACTUATORS ON DISTRIBUTED GENERAL PURPOSE COMPUTING PLATFORMS
An algorithm is presented to automatically determine the relative 3D positions of audio sensors and actuators in an ad-hoc distributed network of heterogeneous general purpose computing platforms. A closed form approximate solution is derived, which is further refined by minimizing a non-linear error function. Our formulation and solution account for the lack of temporal synchronization among different platforms. We also derive an approximate expression for the mean and covariance of the implicitly defined estimator. The theoretical performance limits for the sensor positions are derived and analyzed with respect to the number of sensors and actuators as well as their geometry. We report extensive simulation results and discuss the practical details of implementing our algorithms.
Joint Learning of Correlated Sequence Labelling Tasks Using Bidirectional Recurrent Neural Networks
The stream of words produced by Automatic Speech Recognition (ASR) systems is
typically devoid of punctuation and formatting. Most natural language
processing applications expect segmented and well-formatted texts as input,
which are not available in ASR output. This paper proposes a novel technique of
jointly modeling multiple correlated tasks such as punctuation and
capitalization using bidirectional recurrent neural networks, which leads to
improved performance for each of these tasks. This method could be extended for
joint modeling of any other correlated sequence labeling tasks.
Comment: Accepted in Interspeech 201
DeepSolarEye: Power Loss Prediction and Weakly Supervised Soiling Localization via Fully Convolutional Networks for Solar Panels
The impact of soiling on solar panels is an important and well-studied
problem in the renewable energy sector. In this paper, we present the first
convolutional neural network (CNN) based approach for solar panel soiling and
defect analysis. Our approach takes an RGB image of a solar panel and
environmental factors as inputs to predict power loss, soiling localization,
and soiling type. In computer vision, localization is a complex task which
typically requires manually labeled training data such as bounding boxes or
segmentation masks. Our proposed approach consists of four specialized stages
that completely avoid localization ground truth and need only panel images
with power loss labels for training. The region of impact area obtained from
the predicted localization masks is classified into soiling types using
webly supervised learning. For improving localization capabilities of CNNs, we
introduce a novel bi-directional input-aware fusion (BiDIAF) block that
reinforces the input at different levels of CNN to learn input-specific feature
maps. Our empirical study shows that BiDIAF improves the power loss prediction
accuracy by about 3% and localization accuracy by about 4%. Our end-to-end
model yields further improvement of about 24% on localization when learned in a
weakly supervised manner. Our approach is generalizable and showed promising
results on web-crawled solar panel images. Our system has a frame rate of 22
fps (including all steps) on an NVIDIA TitanX GPU. Additionally, we collected
a first-of-its-kind dataset for solar panel image analysis consisting of 45,000+
images.
Comment: Accepted for publication at WACV 201
Fast Computation of Sums of Gaussians in High Dimensions
Evaluating sums of multivariate Gaussian kernels is a key computational task in many problems in computational statistics and
machine learning. The computational cost of the direct evaluation of such sums scales as the product of the number of kernel
functions and the evaluation points. The fast Gauss transform proposed by Greengard and Strain (1991) is an ε-exact
approximation algorithm that reduces the computational complexity of the evaluation of the sum of N Gaussians at M points
in d dimensions from O(MN) to O(M+N). However, the constant factor in O(M+N) grows
exponentially with increasing dimensionality d, which makes the algorithm impractical for dimensions greater than three. In
this paper we present a new algorithm where the constant factor is reduced to asymptotically polynomial order. The reduction
is based on a new multivariate Taylor's series expansion (which can act both as a local as well as a far field expansion)
scheme combined with the efficient space subdivision using the k-center algorithm. The proposed method differs from the
original fast Gauss transform in terms of a different factorization, efficient space subdivision, and the use of point-wise
error bounds. Algorithm details, error bounds, procedure to choose the parameters and numerical experiments are presented.
As an example we show how the proposed method can be used for very fast ε-exact multivariate kernel density
estimation.
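The space subdivision step mentioned in this abstract can be sketched with a greedy farthest-point heuristic for the k-center problem. Whether this matches the exact variant used in the paper is an assumption, and the function name `farthest_point_clustering` is hypothetical. Each new center is the point currently farthest from all chosen centers, and every point is labeled with its nearest center.

```python
import math

def farthest_point_clustering(points, k):
    """Greedy k-center heuristic (farthest-point clustering).

    Returns (center_indices, labels, radius), where radius is the
    largest distance from any point to its assigned center.
    Cost is O(n * k) distance evaluations for n points.
    """
    centers = [0]                                   # arbitrary first center
    dist = [math.dist(p, points[0]) for p in points]
    labels = [0] * len(points)
    for c in range(1, k):
        # Next center: the point farthest from all current centers.
        nxt = max(range(len(points)), key=dist.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < dist[i]:                         # new center is closer
                dist[i] = d
                labels[i] = c
    return centers, labels, max(dist)

# Two well-separated pairs of points in the plane.
pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0)]
centers, labels, radius = farthest_point_clustering(pts, 2)
```

In this toy input the two pairs end up in separate clusters. Small cluster radii are what keep a truncated series expansion around each center accurate.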
Fast Computation of Kernel Estimators
The computational complexity of evaluating the kernel density estimate (or its derivatives) at m evaluation points given n sample points scales quadratically as O(nm), making it prohibitively expensive for large datasets. While approximate methods like binning could speed up the computation, they lack precise control over the accuracy of the approximation. There is no straightforward way of choosing the binning parameters a priori in order to achieve a desired approximation error. We propose a novel computationally efficient ε-exact approximation algorithm for the univariate Gaussian kernel-based density derivative estimation that reduces the computational complexity from O(nm) to linear O(n+m). The user can specify a desired accuracy ε. The algorithm guarantees that the actual error between the approximation and the original kernel estimate will always be less than ε. We also apply our proposed fast algorithm to speed up automatic bandwidth selection procedures. We compare our method to the best available binning methods in terms of speed and accuracy. Our experimental results show that the proposed method is almost twice as fast as the best binning methods and is around five orders of magnitude more accurate. The software for the proposed method is available online.
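To make the binning trade-off in this abstract concrete, the sketch below compares a direct Gaussian kernel density estimate with a simple nearest-grid-point binned version. This is only an assumed, simplified form of binning (not the "best available binning methods" the abstract refers to), and the names `gauss_kde` and `bin_counts`, the grid range, and the sample data are all hypothetical.

```python
import math

def gauss_kde(samples, targets, h, weights=None):
    """Weighted Gaussian KDE evaluated directly at each target."""
    if weights is None:
        weights = [1.0] * len(samples)
    c = 1.0 / (math.sqrt(2.0 * math.pi) * h * sum(weights))
    return [
        c * sum(w * math.exp(-0.5 * ((y - x) / h) ** 2)
                for x, w in zip(samples, weights))
        for y in targets
    ]

def bin_counts(samples, lo, hi, n_bins):
    """Assign each sample to its nearest point on a uniform grid."""
    step = (hi - lo) / (n_bins - 1)
    grid = [lo + j * step for j in range(n_bins)]
    counts = [0.0] * n_bins
    for x in samples:
        j = min(max(round((x - lo) / step), 0), n_bins - 1)
        counts[j] += 1.0
    return grid, counts

# Deterministic toy data in [-1, 1]; bandwidth chosen arbitrarily.
samples = [math.sin(i) for i in range(200)]
h = 0.3
grid, counts = bin_counts(samples, -2.0, 2.0, 101)

direct = gauss_kde(samples, grid, h)          # uses every sample
binned = gauss_kde(grid, grid, h, counts)     # uses grid counts only
max_err = max(abs(a - b) for a, b in zip(direct, binned))
```

The resulting `max_err` depends on the grid spacing and the bandwidth together, which is the difficulty the abstract points to: the binning parameters do not translate directly into a target error.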